Text-independent speaker recognition using non-linear frame likelihood transformation
Authors
Abstract
When the reference speakers are represented by Gaussian mixture models (GMMs), the conventional approach is to accumulate the frame likelihoods over the whole test utterance and then either compare the results, as in speaker identification, or apply a threshold, as in speaker verification. In this paper we describe a method in which frame likelihoods are transformed into new scores by a non-linear function before they are accumulated. We have studied two families of such functions. The first performs likelihood normalization, a technique widely used in speaker verification, but applied here at the frame level. The second transforms the likelihoods into weights according to some criterion. We call this transformation weighting models rank (WMR). Both kinds of transformation require frame likelihoods from all (or a subset of all) reference models to be available. For this, every frame of the test utterance is input to the required reference models in parallel, and the likelihood transformation is then applied. The new scores are accumulated over the whole test utterance to obtain an utterance-level score for a given speaker model. We have found that normalizing these utterance scores is also effective for speaker verification. Experiments on two databases, the TIMIT corpus and the NTT database for speaker recognition, showed better speaker identification rates and a significant reduction of speaker verification equal error rates (EER) when the frame likelihood transformation was used. © 1998 Elsevier Science B.V. All rights reserved.
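The pipeline described in the abstract (score every frame against all reference models, transform the per-frame likelihoods, then accumulate over the utterance) can be illustrated with a small sketch. The abstract does not give the exact normalization formula or the WMR weighting function, so both are assumptions here: `normalize_frame` subtracts the log of the mean likelihood of the competing models, and `rank_weights` assigns the weight `exp(1 - rank)`. The function names and the toy log-likelihood values are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def accumulate(frame_scores):
    """Sum per-frame scores over the utterance, giving one total per model."""
    n_models = len(frame_scores[0])
    return [sum(frame[s] for frame in frame_scores) for s in range(n_models)]

def normalize_frame(loglikes):
    """ASSUMED normalization: each model's log-likelihood minus the log of
    the mean likelihood of all other models for the same frame."""
    out = []
    for s, ll in enumerate(loglikes):
        others = loglikes[:s] + loglikes[s + 1:]
        background = logsumexp(others) - math.log(len(others))
        out.append(ll - background)
    return out

def rank_weights(loglikes):
    """ASSUMED rank-based weighting: the best model for this frame gets
    rank 1, and every model receives the weight exp(1 - rank)."""
    order = sorted(range(len(loglikes)), key=lambda s: loglikes[s], reverse=True)
    weights = [0.0] * len(loglikes)
    for rank, s in enumerate(order, start=1):
        weights[s] = math.exp(1 - rank)
    return weights

def identify(frame_loglikes, transform=None):
    """Apply an optional per-frame transform, accumulate the scores, and
    return (index of best model, utterance-level totals)."""
    if transform is None:
        scores = frame_loglikes
    else:
        scores = [transform(frame) for frame in frame_loglikes]
    totals = accumulate(scores)
    best = max(range(len(totals)), key=lambda s: totals[s])
    return best, totals

# Toy data (made up): 4 frames scored against 3 speaker models.
frames = [
    [-10.0, -11.0, -12.0],
    [-9.0, -9.5, -13.0],
    [-30.0, -12.0, -29.0],  # a single frame that strongly favors model 1
    [-10.0, -10.5, -12.5],
]

conventional_best, _ = identify(frames)
normalized_best, _ = identify(frames, normalize_frame)
rank_best, _ = identify(frames, rank_weights)
print(conventional_best, normalized_best, rank_best)
```

On this toy data, plain accumulation selects model 1 because of the single extreme frame, while the rank-based weighting selects model 0, which is ranked first in three of the four frames. This only illustrates how a bounded per-frame score changes the accumulation; it says nothing about the results reported in the paper.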
Similar resources
Frame level likelihood normalization for text-independent speaker identification using Gaussian mixture models
In this paper we propose a new speaker identification system, where the likelihood normalization technique, widely used for speaker verification, is introduced. In the new system, which is based on Gaussian Mixture Models, every frame of the test utterance is input to all the reference models in parallel. In this procedure, for each frame, likelihoods from all the models are available, hence th...
Incorporating MAP estimation and covariance transform for SVM based speaker recognition
In this paper, we apply Constrained Maximum a Posteriori Linear Regression (CMAPLR) transformation on Universal Background Model (UBM) when characterizing each speaker with a supervector. We incorporate the covariance transformation parameters into the supervector in addition to the mean transformation parameters. Maximum Likelihood Linear Regression (MLLR) covariance transformation is adopted....
Discriminative training of GMM using a modified EM algorithm for speaker recognition
In this paper, we present a new discriminative training method for Gaussian Mixture Models (GMM) and its application to text-independent speaker recognition. The objective of this method is to maximize the frame-level normalized likelihoods of the training data, which is why we call it Maximum Normalized Likelihood Estimation (MNLE). In contrast to other discriminative algorithms, the o...
Speaker Identification for Whispered Speech Using a Training Feature Transformation from Neutral to Whisper
A number of research studies in speaker recognition have recently focused on robustness to microphone and channel mismatch (e.g., NIST SRE). However, changes in vocal effort, especially whispered speech, present significant challenges in maintaining system performance. Due to the mismatched spectral structure resulting from the different production mechanisms, performance of speaker identifica...
Acoustic analysis and feature transformation from neutral to whisper for speaker identification within whispered speech audio streams
Whispered speech is an alternative speech production mode from neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to avoid certain content from being overheard or made public. Due to the profound differences between whispered and neutral speech in vocal excitation and vocal tract function, the performance of automatic speaker identi...
Journal title:
- Speech Communication
Volume 24, issue
Pages -
Publication date 1998